Sublist Shared Cache Improvements #726

Merged: 5 commits merged from sublist_cache into master on Aug 27, 2018
Conversation

derekcollison (Member):

Sublists have a shared caching layer. Most of the extreme performance in NATS comes from the client's L1 cache of the sublist, which has no contention. This works well as long as the sublist has a fairly low rate of change. However, when highly concurrent reads hit the sublist's shared cache, performance suffered.

Also, #710 showed a case where the shared cache made things worse.

This PR alleviates some of those issues by being smarter about adding or removing literal subjects from the shared cache.

It also uses Go 1.9's sync.Map, which performs better under highly concurrent conditions with many cores.

/cc @nats-io/core
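
Editor's note: a rough, self-contained sketch of the two ideas above, using simplified stand-in types (Sublist, SublistResult, and subscription here are illustrative, not the server's actual definitions). The shared cache is a sync.Map, and when a subscription's subject is literal only the cache entry keyed by that exact subject needs updating, so the whole cache does not have to be scanned.

package main

import "sync"

// Illustrative stand-ins, not the nats-server definitions.
type subscription struct{ subject string }
type SublistResult struct{ psubs []*subscription }

func (r *SublistResult) addSubToResult(sub *subscription) *SublistResult {
	return &SublistResult{psubs: append(append([]*subscription(nil), r.psubs...), sub)}
}

type Sublist struct {
	cache sync.Map // published subject -> *SublistResult
}

// subjectIsLiteral approximates the server's check: '*' and '>' only count
// as wildcards when they form a token of their own.
func subjectIsLiteral(subject string) bool {
	for i := 0; i < len(subject); i++ {
		if subject[i] == '*' || subject[i] == '>' {
			if (i == 0 || subject[i-1] == '.') && (i+1 == len(subject) || subject[i+1] == '.') {
				return false
			}
		}
	}
	return true
}

// addToCache sketches the literal fast path: a literal subscription subject
// can only affect the cache entry keyed by that exact subject, so there is
// no need to walk the whole shared cache. Wildcard subjects still require a
// scan of the cache (see the Range() discussion below).
func (s *Sublist) addToCache(subject string, sub *subscription) {
	if subjectIsLiteral(subject) {
		if v, ok := s.cache.Load(subject); ok {
			s.cache.Store(subject, v.(*SublistResult).addSubToResult(sub))
		}
		return
	}
	// Wildcard case intentionally omitted in this sketch.
}

func main() {
	s := &Sublist{}
	s.cache.Store("foo.bar", &SublistResult{})
	s.addToCache("foo.bar", &subscription{subject: "foo.bar"})
}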

derekcollison requested a review from kozlovic on August 26, 2018 at 00:55
coveralls commented Aug 26, 2018

Coverage increased (+0.1%) to 92.122% when pulling 2c4b7e7 on sublist_cache into 34b556d on master.

kozlovic (Member) left a comment:

One of the Range() callbacks returns false to stop the iteration, and I am not sure that should be the case.

r := v.(*SublistResult)
if matchLiteral(key, subject) {
s.cache.Store(key, r.addSubToResult(sub))
return false
Member:

Returning false will stop the iteration. That's not what the original code was doing.
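
Editor's note: a minimal sketch of what the corrected callback would look like, reusing the names from the snippet quoted above; the fix is to return true so that Range() keeps visiting every cached entry.

s.cache.Range(func(k, v interface{}) bool {
	key := k.(string)
	r := v.(*SublistResult)
	if matchLiteral(key, subject) {
		s.cache.Store(key, r.addSubToResult(sub))
	}
	// Returning true continues the iteration; returning false would stop
	// Range() after the first matching entry.
	return true
})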

Member Author:

Correct, will fix. I will add a test too.

Member Author:

I added a separate test, but could fold it into the larger cache test if you think that is better.

Member Author:

I moved it into the larger cache test and also double-checked that it failed without the change.

checkBool(subjectIsLiteral("foo.*.>"), false, t)
checkBool(subjectIsLiteral("foo.*.bar"), false, t)
checkBool(subjectIsLiteral("foo.bar.>"), false, t)
}
Member:

I think we had changes in the not so distant past that made, say, foo*.bar a literal. You may want to add a test verifying that * and > are indeed not treated as wildcards when they are not tokens of their own.
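
Editor's note: a sketch of what such additional checks could look like, following the checkBool/subjectIsLiteral pattern from the snippet above (the expected values assume an embedded * or > is treated as a literal character):

checkBool(subjectIsLiteral("foo*.bar"), true, t)   // '*' is not a token of its own
checkBool(subjectIsLiteral("foo.bar*"), true, t)
checkBool(subjectIsLiteral("foo.b>ar"), true, t)   // '>' is not a token of its own
checkBool(subjectIsLiteral("foo.*.bar"), false, t) // still a wildcard as its own token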

Member Author:

Good idea.

Signed-off-by: Derek Collison <[email protected]>
derekcollison (Member, Author):

Let me make sure we are green, then I will merge.

gwik (Contributor) left a comment:

Thanks for addressing this issue.

I think this should clear the contention issues I raised in #710 for the most part, since the write lock is no longer held while matching and the sweeping is async.
However, I still feel both the server's "client" cache and the sublist cache are inefficient in this use case, and I'd rather disable them entirely.
I will run my experiment again, compare it with the no-cache version I have, and let you know. I will also run the benchmark suite; the contention benchmarks are a good addition :)

I think there may be a race in the cache handling.

delete(s.cache, k)
break
}
s.cache.Store(subject, result)
Contributor:

I think this introduces a race. For example, if a client subscribes between the unlock and the call to s.cache.Store, it will be shadowed and won't be delivered messages for this subscription until the cache entry gets cleared.
Also, for queue subscriptions, a client could have removed a subscription (while still being connected) and be delivered messages it is no longer processing, which would "steal" them from other valid subscriptions of the same queue.

The read lock should be held while calling s.cache.Store to ensure that the sublist won't be modified concurrently.
There may be similar issues with Load; although I don't see one at the moment, holding the read lock there is probably a good idea too.
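
Editor's note: a minimal, self-contained sketch of the suggested fix, using simplified stand-in types rather than the server's actual code. Both the match and the cache Store happen while the read lock is held, so a subscribe or unsubscribe (which needs the write lock) cannot slip in between them and end up shadowed by a stale cache entry.

package main

import "sync"

// Illustrative stand-in for the sublist; not the nats-server type.
type Sublist struct {
	sync.RWMutex
	subs  map[string][]string // subscription state, guarded by the RWMutex
	cache sync.Map            // subject -> cached match result
}

func (s *Sublist) Match(subject string) []string {
	if r, ok := s.cache.Load(subject); ok {
		return r.([]string)
	}
	s.RLock()
	// Compute the result and publish it to the cache while still holding
	// the read lock; a writer cannot modify s.subs (and invalidate this
	// result) until the lock is released.
	result := append([]string(nil), s.subs[subject]...)
	s.cache.Store(subject, result)
	s.RUnlock()
	return result
}

func main() {
	s := &Sublist{subs: map[string][]string{"foo.bar": {"subA"}}}
	_ = s.Match("foo.bar")
}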

Member:

Exactly why I disliked that Go added sync.Map. It "introduces" these kinds of issues.

Contributor:

The same class of problems can happen with atomic operations, but they are still useful. Concurrency is hard to reason about...

Member Author:

Let me take a closer look. Let me know if your production issue is resolved with this fix, though.

Member Author:

OK, #729 should handle it. Thanks for pointing this out, you were correct.
